Robust supervised topic models under label noise

Authors

Abstract

Recently, statistical topic modeling approaches have been widely applied in the field of supervised document classification. However, there has been little research on these models under label noise, which exists in real-world applications. For example, many large-scale datasets are collected from websites or annotated by human workers of varying quality, and therefore contain mislabeled items. In this paper, we propose two robust models for document classification problems: Smoothed Labeled LDA (SL-LDA) and Adaptive Labeled LDA (AL-LDA). SL-LDA is an extension of Labeled LDA (L-LDA), a classical supervised topic model. The proposed model overcomes a shortcoming of L-LDA, i.e., overfitting to noisy labels, through Dirichlet smoothing. AL-LDA is an iterative optimization framework based on SL-LDA. At each step of the procedure, it updates the prior that incorporates the observed labels, using a concise algorithm derived from the principles of maximizing entropy and minimizing cross-entropy. This method avoids explicitly identifying mislabeled items, a common difficulty for existing noise-cleaning algorithms. Quantitative experimental results under noise completely at random (NCAR) and Multiple Noisy Sources (MNS) settings demonstrate the outstanding performance of our models on noisy labels. In particular, AL-LDA has significant advantages relative to state-of-the-art methods under massive noise.
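
The abstract does not spell out the smoothing formula, so the following is only a minimal Python sketch of the general idea behind Dirichlet smoothing of a Labeled-LDA prior: the observed (possibly noisy) labels receive most of the prior mass, but every topic keeps a small smoothing mass so that a single wrong label cannot lock a document onto the wrong topics during inference. The function and parameter names (build_smoothed_prior, base_alpha, eta) are illustrative, not taken from the paper.

    import numpy as np

    def build_smoothed_prior(labels, num_topics, base_alpha=50.0, eta=0.1):
        """Illustrative Dirichlet prior over topics for one document.

        Plain L-LDA puts prior mass only on the observed labels, so a wrong
        label forces the document onto a wrong topic. Adding a small
        smoothing mass eta on every topic (a sketch of the idea behind
        SL-LDA's Dirichlet smoothing) lets the data pull probability away
        from a noisy label during inference.
        """
        alpha = np.full(num_topics, eta)      # smoothing mass on all topics
        for k in labels:                      # observed (possibly noisy) labels
            alpha[k] += base_alpha
        return alpha

    # Example: a document labeled with topics {2, 5} out of 10
    prior = build_smoothed_prior([2, 5], num_topics=10)
    theta = np.random.default_rng(0).dirichlet(prior)   # one draw of topic proportions
    print(np.round(theta, 3))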


Similar resources

Supervised Topic Models

We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict respons...
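
The excerpt above omits the form of the response model; in sLDA the response of a document is tied to the empirical frequencies of its per-word topic assignments through a generalized linear model. Below is a minimal sketch of that prediction step for a Gaussian response, with illustrative names (predict_response, eta).

    import numpy as np

    def predict_response(topic_assignments, eta, num_topics):
        """sLDA-style prediction for a Gaussian response: E[y] = eta . z_bar,
        where z_bar holds the empirical frequencies of the per-word topic
        assignments in the document."""
        z_bar = np.bincount(topic_assignments, minlength=num_topics) / len(topic_assignments)
        return float(eta @ z_bar)

    # Toy example: 8 words, 4 topics, illustrative regression coefficients
    eta = np.array([1.0, -0.5, 2.0, 0.0])
    z = np.array([0, 0, 2, 2, 2, 1, 3, 0])
    print(predict_response(z, eta, num_topics=4))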


Robust Loss Functions under Label Noise for Deep Neural Networks

In many applications of classifier learning, training data suffers from label noise. Deep networks are learned using huge training data where the problem of noisy labels is particularly relevant. The current techniques proposed for learning deep networks under label noise focus on modifying the network architecture and on algorithms for estimating true labels from noisy labels. An alternate app...
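
The excerpt is cut off before it names the alternate approach, so the sketch below only illustrates the general notion of a noise-robust loss: it compares categorical cross-entropy, which grows without bound on a confidently mispredicted (possibly mislabeled) example, with mean absolute error on the predicted probabilities, a standard example of a bounded, symmetric loss that is more tolerant of label noise. This is a generic illustration, not a claim about the paper's method.

    import numpy as np

    def cross_entropy(p, y):
        """Categorical cross-entropy for one sample: -log p_y."""
        return -np.log(p[y] + 1e-12)

    def mae_loss(p, y):
        """Mean absolute error between the predicted probabilities and the
        one-hot label; a standard example of a bounded, noise-tolerant loss."""
        one_hot = np.zeros_like(p)
        one_hot[y] = 1.0
        return np.abs(p - one_hot).sum()

    p = np.array([0.7, 0.2, 0.1])    # model's predicted class probabilities
    print(cross_entropy(p, y=0), mae_loss(p, y=0))   # clean label
    print(cross_entropy(p, y=2), mae_loss(p, y=2))   # a (possibly noisy) label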


Robust Semi-Supervised Learning through Label Aggregation

Semi-supervised learning is proposed to exploit both labeled and unlabeled data. However, as the scale of data in real world applications increases significantly, conventional semisupervised algorithms usually lead to massive computational cost and cannot be applied to large scale datasets. In addition, label noise is usually present in the practical applications due to human annotation, which ...


An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...
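
The excerpt only notes that kNN and k-means depend on the choice of distance; below is a minimal sketch of how a Mahalanobis-style metric, parameterised by a matrix M that a metric-learning algorithm would estimate from (possibly noisy) labels, plugs into a 1-NN classifier. The matrix here is hand-picked for illustration rather than learned, and the names (mahalanobis, one_nn_predict) are assumptions.

    import numpy as np

    def mahalanobis(x, y, M):
        """Squared Mahalanobis distance (x - y)^T M (x - y); M is what a
        metric-learning method would estimate from labeled data."""
        d = x - y
        return float(d @ M @ d)

    def one_nn_predict(query, X_train, y_train, M):
        """1-nearest-neighbour prediction under the supplied metric."""
        dists = [mahalanobis(query, x, M) for x in X_train]
        return y_train[int(np.argmin(dists))]

    X = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 2.0]])
    y = np.array([0, 1, 1])
    M = np.diag([1.0, 0.1])          # illustrative metric that downweights the second feature
    print(one_nn_predict(np.array([0.2, 1.8]), X, y, M))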


Supervised dimension reduction with topic models

We consider supervised dimension reduction (SDR) for problems with discrete variables. Existing methods are computationally expensive, and often do not take the local structure of data into consideration when searching for a low-dimensional space. In this paper, we propose a novel framework for SDR which is (1) general and flexible so that it can be easily adapted to various unsupervised topic ...
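
The excerpt is truncated before it describes the proposed framework, so the sketch below only shows the basic idea of topic-model-based dimension reduction: project bag-of-words counts onto a small number of topic proportions and train a classifier in that low-dimensional space. It uses scikit-learn's LatentDirichletAllocation and LogisticRegression on toy data and is not the paper's specific method.

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression

    # Toy bag-of-words counts: 6 documents over an 8-term vocabulary
    rng = np.random.default_rng(0)
    X = rng.integers(1, 5, size=(6, 8))
    y = np.array([0, 0, 0, 1, 1, 1])         # document labels (possibly noisy)

    # Unsupervised topic model as the dimension-reduction step
    lda = LatentDirichletAllocation(n_components=3, random_state=0)
    Z = lda.fit_transform(X)                 # documents -> topic proportions

    # Classifier trained in the low-dimensional topic space
    clf = LogisticRegression().fit(Z, y)
    print(clf.predict(lda.transform(X[:2])))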



Journal

Journal title: Machine Learning

Year: 2021

ISSN: 0885-6125, 1573-0565

DOI: https://doi.org/10.1007/s10994-021-05967-y